{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3a9a252f-5e79-4f33-b78c-c3a34e0f1a59",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Extractive Question Answering\n",
    "\n",
    "Language models are good at generating text, but generations are not always accurate. One way to increase accuracy is to provide context for answering a question within the prompt.\n",
    "\n",
    "First, we'll load some context."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "8d925dca-a4f0-4770-a217-fbd2da492434",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation via the off-side rule.Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a \"batteries included\" language due to its comprehensive standard library.Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.Python consistently ranks as one of the most popular programming languages.\n"
     ]
    }
   ],
   "source": [
    "import languagemodels as lm\n",
    "\n",
    "python_info = lm.get_wiki(\"Python\")\n",
    "print(python_info)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a5830e4-5efc-4d3f-9452-c2f44d5c2e51",
   "metadata": {},
   "source": [
    "We can now prompt the model to answer the question using the context."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "22ae1c03-7168-42db-a0f8-4057c79fd91d",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Guido van Rossum created Python.\n"
     ]
    }
   ],
   "source": [
    "print(lm.do(f\"Answer from the context: Who created Python? {python_info}\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1bed464-cbab-41e2-90d6-2beb02f97c31",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Embeddings\n",
    "\n",
    "Language models are capable of answering questions based on a context, but we now need a way to provide them with appropriate context.\n",
    "\n",
    "One solution to this is to have a large amount of available context and retrieve only the meaningful bits when answering a question. Embeddings are a tool to achieve this.\n",
    "\n",
    "Embeddings provide a way to map a numeric vector to the meaning of some input. In the case of language models, embeddings are derived from documents.\n",
    "\n",
    "## Semantic Search\n",
    "\n",
    "Once we have mapped vectors to our documents, we can search for similar documents by meaning. If we've constructed our embedding model appropriately, documents that answer questions will be near the questions themselves in vector space.\n",
    "\n",
    "The math to achieve that is out of scope of this example, but the languagemodels package provides a few simple helper functions to facilated a document store capable of semantic search."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b5816039-7071-4faf-be40-2451c265945a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Load some programming language documents\n",
    "for topic in ['Python', 'Javascript', 'C++', 'SQL', 'HTML']:\n",
    "    doc = lm.get_wiki(topic)\n",
    "    lm.store_doc(doc)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3412a004-2505-44ae-92b2-13e71ff6bcb9",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'It is often described as a \"batteries included\" language due to its comprehensive standard library.Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.\\n\\nPython is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation via the off-side rule.\\n\\n18, released in 2020, was the last release of Python 2.Python consistently ranks as one of the most popular programming languages.'"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Perform semantic search\n",
    "lm.get_doc_context(\"Who created Python?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "8442908a-2527-4279-be8e-55b6a64c2522",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'JavaScript is often associated with HTML and CSS.'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Put everything together to answer a general question about one of the languages\n",
    "question = \"What technologies are often associated with JS?\"\n",
    "\n",
    "context = lm.get_doc_context(question)\n",
    "\n",
    "lm.do(f\"Answer from the context: {question} {context}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
